First we need to explore the data to find out what information is available. Once we know that, we can dig deeper and decide what would be most interesting to analyse.
Let’s start by reading the data and checking how many rows and columns it has.
## [1] 2383 47
There are 2383 rows and 47 columns. Let’s check the column names.
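As a rough illustration of this first step, the sketch below counts rows and columns using Python’s standard library (the original analysis was written in R). The inline sample and its three columns are a toy stand-in, assumed for illustration, for the real 47-column file.

```python
import csv
import io

# Toy stand-in for the CSV file (assumption); the real file has 47 columns.
sample_csv = io.StringIO(
    "INCIDENT_DATE,DIVISION,SUBJECT_RACE\n"
    "9/3/16,CENTRAL,Black\n"
    "3/22/16,NORTHEAST,White\n"
)

rows = list(csv.DictReader(sample_csv))  # one dict per incident
n_rows = len(rows)
n_cols = len(rows[0]) if rows else 0
print(n_rows, n_cols)
```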
## [1] "INCIDENT_DATE"
## [2] "INCIDENT_TIME"
## [3] "UOF_NUMBER"
## [4] "OFFICER_ID"
## [5] "OFFICER_GENDER"
## [6] "OFFICER_RACE"
## [7] "OFFICER_HIRE_DATE"
## [8] "OFFICER_YEARS_ON_FORCE"
## [9] "OFFICER_INJURY"
## [10] "OFFICER_INJURY_TYPE"
## [11] "OFFICER_HOSPITALIZATION"
## [12] "SUBJECT_ID"
## [13] "SUBJECT_RACE"
## [14] "SUBJECT_GENDER"
## [15] "SUBJECT_INJURY"
## [16] "SUBJECT_INJURY_TYPE"
## [17] "SUBJECT_WAS_ARRESTED"
## [18] "SUBJECT_DESCRIPTION"
## [19] "SUBJECT_OFFENSE"
## [20] "REPORTING_AREA"
## [21] "BEAT"
## [22] "SECTOR"
## [23] "DIVISION"
## [24] "LOCATION_DISTRICT"
## [25] "STREET_NUMBER"
## [26] "STREET_NAME"
## [27] "STREET_DIRECTION"
## [28] "STREET_TYPE"
## [29] "LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION"
## [30] "LOCATION_CITY"
## [31] "LOCATION_STATE"
## [32] "LOCATION_LATITUDE"
## [33] "LOCATION_LONGITUDE"
## [34] "INCIDENT_REASON"
## [35] "REASON_FOR_FORCE"
## [36] "TYPE_OF_FORCE_USED1"
## [37] "TYPE_OF_FORCE_USED2"
## [38] "TYPE_OF_FORCE_USED3"
## [39] "TYPE_OF_FORCE_USED4"
## [40] "TYPE_OF_FORCE_USED5"
## [41] "TYPE_OF_FORCE_USED6"
## [42] "TYPE_OF_FORCE_USED7"
## [43] "TYPE_OF_FORCE_USED8"
## [44] "TYPE_OF_FORCE_USED9"
## [45] "TYPE_OF_FORCE_USED10"
## [46] "NUMBER_EC_CYCLES"
## [47] "FORCE_EFFECTIVE"
We know the data is about police incidents. The columns cover the officer and the subject, the force used, and the location of each incident. Let’s look at the data by location first.
First, check how many cities and states the incidents cover.
## [1] "Dallas"
## [1] "TX"
All the incidents are in Dallas, Texas. Let’s check which divisions are present.
## [1] "CENTRAL" "NORTHEAST" "SOUTHWEST" "NORTH CENTRAL"
## [5] "SOUTHEAST" "NORTHWEST" "SOUTH CENTRAL"
The incidents span 7 divisions of Dallas. Let’s find out which division has the most incidents.
The CENTRAL division has the most incidents by some margin. The other divisions have similar counts of around 250 each, except NORTHWEST, which has the fewest at about 200.
From the above analysis, it is clear that all the incidents fall within 7 divisions of Dallas, Texas.
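Counting incidents per division amounts to a frequency table. A minimal Python sketch follows; the short `divisions` list is made-up toy data, not the real `DIVISION` column.

```python
from collections import Counter

# Toy values standing in for the DIVISION column (assumption).
divisions = ["CENTRAL", "NORTHEAST", "CENTRAL", "SOUTHWEST", "CENTRAL", "NORTHWEST"]

division_counts = Counter(divisions)

# Print divisions from most to fewest incidents.
for division, count in division_counts.most_common():
    print(f"{division:<12}{count}")
```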
Let’s check the subject’s race.
## [1] "Black" "Hispanic" "White" "NULL" "Asian"
## [6] "Other" "American Ind"
There are 6 distinct races, and some records have no race information (NULL). Let’s explore how incidents are distributed by subject race.
Almost all the incidents involve subjects of 3 races: Black, Hispanic and White. Since the other races have very few records, it is better to restrict further exploration to those 3 races.
Let’s explore subject gender.
## [1] "Female" "Male" "NULL" "Unknown"
There are 3 distinct gender values (Female, Male and Unknown), and gender is missing (NULL) for a few records. Now, let’s check the number of incidents by gender.
Most of the records are for two genders, Male and Female. For further analysis we can ignore the other values, as there are so few of them that it would be hard to draw anything meaningful from so little data.
Quite a few columns are dedicated to force. Let’s explore force a bit, starting with the reason for the use of force.
## [1] "Arrest" "Danger to self or others"
## [3] "Detention/Frisk" "Weapon Display"
## [5] "Other" "Active Aggression"
## [7] "Assault to Other Person" "Crowd Disbursement"
## [9] "NULL" "Aggressive Animal"
## [11] "Barricaded Person" "Property Destruction"
There are 11 distinct reasons for the use of force, plus NULL for records with no reason given. Let’s figure out which reasons are the most common.
‘Arrest’ is the most common reason for use of force, accounting for almost half of all cases. ‘Active Aggression’, ‘Danger to self or others’ and ‘Weapon Display’ are the next most common reasons. Now that we know the reasons behind the use of force, let’s figure out which types of force were used.
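When counting categories it helps to separate real values from missing-value markers. The Python sketch below uses the values printed in the output above and assumes that `NULL` and the empty string both mean "not recorded".

```python
# Values copied from the printed REASON_FOR_FORCE output.
reasons = [
    "Arrest", "Danger to self or others", "Detention/Frisk", "Weapon Display",
    "Other", "Active Aggression", "Assault to Other Person", "Crowd Disbursement",
    "NULL", "Aggressive Animal", "Barricaded Person", "Property Destruction",
]

# Assumption: these markers denote a missing value rather than a real reason.
MISSING_MARKERS = {"NULL", ""}

distinct_reasons = sorted({r for r in reasons if r not in MISSING_MARKERS})
print(len(reasons), "printed values,", len(distinct_reasons), "real reasons")
```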
## [1] "TYPE_OF_FORCE_USED1" "TYPE_OF_FORCE_USED2" "TYPE_OF_FORCE_USED3"
## [4] "TYPE_OF_FORCE_USED4" "TYPE_OF_FORCE_USED5" "TYPE_OF_FORCE_USED6"
## [7] "TYPE_OF_FORCE_USED7" "TYPE_OF_FORCE_USED8" "TYPE_OF_FORCE_USED9"
## [10] "TYPE_OF_FORCE_USED10"
From the column names, we can see that up to 10 types of force can be recorded for a single incident. Let’s explore these force columns a bit.
## [1] "Hand/Arm/Elbow Strike" "Joint Locks"
## [3] "Take Down - Group" "K-9 Deployment"
## [5] "Verbal Command" "Hand Controlled Escort"
## [7] "Weapon display at Person" "Held Suspect Down"
## [9] "BD - Grabbed" "BD - Pushed"
## [11] "Handcuffing Take Down" "Taser"
## [13] "Take Down - Arm" "Other Impact Weapon"
## [15] "Take Down - Head" "Feet/Leg/Knee Strike"
## [17] "Pressure Points" "Taser Display at Person"
## [19] "BD - Tripped" "Take Down - Body"
## [21] "Leg Restraint System" "OC Spray"
## [23] "Pepperball Impact" "Combat Stance"
## [25] "Baton Display" "Baton Strike/Open Mode"
## [27] "Baton Strike/Closed Mode" "LVNR"
## [29] "Pepperball Saturation"
The first force column contains 29 distinct types of force.
‘Verbal Command’ and ‘Weapon display at Person’ are the most common forms of force used at the initial stage. Let’s see what types of force are used at later stages.
## [1] "" "Take Down - Arm"
## [3] "Hand Controlled Escort" "Leg Restraint System"
## [5] "BD - Tripped" "Held Suspect Down"
## [7] "Verbal Command" "Weapon display at Person"
## [9] "Pressure Points" "BD - Grabbed"
## [11] "Taser" "Hand/Arm/Elbow Strike"
## [13] "Take Down - Head" "Feet/Leg/Knee Strike"
## [15] "BD - Pushed" "Taser Display at Person"
## [17] "Joint Locks" "Take Down - Group"
## [19] "Take Down - Body" "OC Spray"
## [21] "Handcuffing Take Down" "Baton Strike/Open Mode"
## [23] "Pepperball Saturation" "Baton Display"
## [25] "K-9 Deployment" "Combat Stance"
## [27] "Other Impact Weapon"
There are 27 distinct values in the second force column: 26 types of force plus the empty string, which marks incidents with no secondary force. Let’s find out which are used most.
Unsurprisingly, the most common value is the empty one, meaning that no further force was needed after the initial force. Among actual secondary forces, ‘Verbal Command’ and ‘Held Suspect Down’ are the most common.
From the exploration so far, we understand that all the policing incidents are from Dallas, Texas. The data has the subject’s race and gender, area details, and whether force was used and, if so, how many types. It also has the time of each incident, which we have not explored much yet because the time columns need reformatting.
In the next steps, we will reformat the time and focus mainly on incidents over time, subjects by gender and race, and force used on subjects. The number of force types used per incident is also worth determining. We will try to explore and analyse subject gender and race and find relationships between them.
Let’s reformat the time and explore incident times a bit more.
## [1] "16"
All the data is from the year 2016, so a year-by-year analysis is not feasible. We will focus on monthly, weekly and hourly incident occurrence instead.
Let’s look at day-by-day incidents over 2016, with a smoothed line to make the plot easier to read.
The distribution shows incidents over the whole year. The incident rate peaks around March and decreases towards the end of the year. It seems to range between about 4 and 25 per day. Let’s explore a bit more.
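The year check can be sketched in Python as follows. The `m/d/yy` date format and the sample strings are assumptions for illustration; the two-digit year is at least consistent with the `"16"` printed above.

```python
from datetime import datetime

# Assumed format of INCIDENT_DATE, e.g. "9/3/16" (month/day/two-digit year).
DATE_FORMAT = "%m/%d/%y"

# Toy date strings (hypothetical).
raw_dates = ["9/3/16", "3/22/16", "12/31/16"]

parsed_dates = [datetime.strptime(d, DATE_FORMAT).date() for d in raw_dates]
years = sorted({d.year for d in parsed_dates})
months = [d.month for d in parsed_dates]
print(years, months)
```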
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 4.000 6.000 6.751 9.000 24.000
The median is 6 incidents per day and the mean is 6.751. The maximum in one day is 24 and the minimum is 1. However, there could be days on which no incident happened, which obviously do not appear in our data. We will explore that a bit later.
The box plot shows no outliers at the lower end but a few at the upper end. In most cases they will not have a large impact on the analysis.
From the density plot, the most common daily count is 3 to 5 incidents. There are very few values at the higher end of the distribution.
Let’s find out which months have the most incidents and see whether there is any pattern.
The month-by-month distribution confirms our earlier observation: incidents peaked in February and March and decreased towards the end of the year. This cannot be called a pattern, however, as we have only one year of data, so it is hard to draw conclusions about yearly seasonality.
The day-by-day distribution makes it clear that some days, such as December 4th, have no incidents at all. Most of the high-count days are in the first few months, and the blank and low-count days are in the last few months, which is consistent with the monthly plot.
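Days with no incidents never appear when counting rows per date, so the summary statistics describe only days with at least one incident. The Python sketch below, on hypothetical toy dates, fills in the missing days with zero and shows how the mean and median shift.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, median

# Toy incident dates (hypothetical): nothing is recorded on 2016-01-03.
incident_dates = (
    [date(2016, 1, 1)] * 3 + [date(2016, 1, 2)] + [date(2016, 1, 4)] * 2
)
observed = Counter(incident_dates)  # only days that appear in the data

# Build the full calendar range and look up each day, defaulting to zero.
start, end = min(observed), max(observed)
all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
daily_counts = [observed[day] for day in all_days]
zero_days = [day for day in all_days if observed[day] == 0]

observed_only = list(observed.values())
print("observed days only:", mean(observed_only), median(observed_only))
print("with zero days:    ", mean(daily_counts), median(daily_counts))
print("days with no incidents:", zero_days)
```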
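The box plot reading can be cross-checked against the summary output. This Python sketch assumes the plot uses the common 1.5 × IQR whisker rule; that is an assumption, since the plotting code is not shown here.

```python
# Quartiles and extremes taken from the summary output above.
q1, q3 = 4.0, 9.0
min_count, max_count = 1, 24

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # points below this would be low outliers
upper_fence = q3 + 1.5 * iqr  # points above this are high outliers

has_low_outliers = min_count < lower_fence
has_high_outliers = max_count > upper_fence
print(lower_fence, upper_fence, has_low_outliers, has_high_outliers)
```

Under that rule the lower fence is negative, so a daily count can never be a low outlier, while any day with more than 16 incidents is flagged at the upper end.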
Let’s check incidents by weekday and see whether the weekend has any effect on the incident level.
The weekday column chart indicates that weekends have the most incidents. Friday also has more incidents than the other weekdays, possibly because Friday night marks the start of the weekend. Overall, Sunday has the most incidents. However, the Saturday and Sunday totals could be inflated by a few days with unusually many incidents, which would raise their percentage share.
The weekday distribution makes it clear that most Fridays, Saturdays and Sundays have more incidents than other days of the week. This is consistent over the year.
Let’s figure out which hours of the day have the most incidents.
Incidents occur more often in the evening and at night, and 5pm to 9pm is the busiest period. However, this could be inflated by a few major incidents in that time period. Let’s explore it further.
We plot each day’s incidents by hour, with a low alpha value so that overlapping points show up as darker areas. The graph makes it clearer that the evening incidents are not random: the 5pm to 9pm interval is more solidly coloured because of the overlap, and it also shows higher counts.
Let’s explore subject race.
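Bucketing incidents by hour can be sketched as below in Python. The 12-hour time format and the sample strings are assumptions; the real `INCIDENT_TIME` values may be formatted differently.

```python
from collections import Counter
from datetime import datetime

# Assumed format of INCIDENT_TIME, e.g. "4:14:00 PM".
TIME_FORMAT = "%I:%M:%S %p"

# Toy time strings (hypothetical).
raw_times = ["4:14:00 PM", "6:05:00 PM", "8:30:00 PM", "2:10:00 AM", "5:45:00 PM"]

hours = [datetime.strptime(t, TIME_FORMAT).hour for t in raw_times]
by_hour = Counter(hours)

# 5:00pm up to, but not including, 9:00pm.
EVENING_HOURS = range(17, 21)
evening_share = sum(by_hour[h] for h in EVENING_HOURS) / len(hours)
print(sorted(by_hour.items()), evening_share)
```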
## cat_var
## American Ind Asian Black Hispanic NULL Other
## 0.04196391 0.20981956 55.93789341 21.98908938 1.63659253 0.46160302
## White
## 19.72303819
There are 3 main races among the subjects: ‘Black’ is the majority (about 56%), followed by ‘Hispanic’ (22%) and ‘White’ (20%). Let’s ignore the other races for now.
The pattern over the day looks similar for all three races. White and Hispanic subjects have similar incident rates, while Black subjects have more incidents throughout the day. Let’s check incidents by weekday for each race.
Incidents involving ‘Black’ subjects are most frequent on Friday, and those involving ‘Hispanic’ subjects on Sunday. Overall, all three races have more incidents on Friday and at the weekend. However, the Sunday count for ‘Hispanic’ subjects is strikingly high compared with other days. Let’s explore further.
Let’s check weekdays using the median value to make sure the result is not driven by outliers.
We are now looking at median daily incidents by race. The Sunday peak still holds for ‘Hispanic’ subjects, and ‘Black’ subjects still have the most incidents.
Let’s draw a map to see whether there is any pattern by race across areas of the city.
From the map we can observe that incidents involving White subjects are more common in the centre and the upper portion of the map, with few in the lower portion. Incidents involving Black subjects, on the other hand, are more common in the centre and the lower part of the map. Incidents involving Hispanic subjects are spread all over the map, with slightly more on the left side. Overall, the centre of the map has the most incidents, followed by the lower-left portion, while the upper-left portion has the fewest.
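The percentages in the output above can be turned back into record counts, given the 2383 rows reported at the start. The counts in this Python sketch are derived by rounding and do not appear in the original output.

```python
TOTAL_ROWS = 2383  # row count reported at the start of the analysis

# Percentages copied from the output above.
race_percent = {
    "American Ind": 0.04196391, "Asian": 0.20981956, "Black": 55.93789341,
    "Hispanic": 21.98908938, "NULL": 1.63659253, "Other": 0.46160302,
    "White": 19.72303819,
}

# Recover whole-number counts from the percentages.
race_counts = {race: round(p / 100 * TOTAL_ROWS) for race, p in race_percent.items()}

main_three = sum(race_counts[r] for r in ("Black", "Hispanic", "White"))
main_three_share = main_three / TOTAL_ROWS
print(race_counts)
print(main_three, round(100 * main_three_share, 2))
```

Restricting the analysis to the three main races therefore keeps 2327 of the 2383 records.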
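The difference between comparing weekdays by mean and by median can be sketched as follows in Python. The records are hypothetical toy data, built so that one unusually busy Friday pulls the mean up while the median stays put. Days with no incidents for a group are not counted here, which is a simplification.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean, median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Toy (date, race) records; 2016-01-01, 01-08 and 01-15 are all Fridays.
records = (
    [(date(2016, 1, 1), "Black")] * 2
    + [(date(2016, 1, 8), "Black")] * 3
    + [(date(2016, 1, 15), "Black")] * 10  # one unusually busy day
)

# Step 1: incidents per (date, race).
per_day = Counter(records)

# Step 2: collect the daily counts under (race, weekday).
grouped = defaultdict(list)
for (day, race), n in per_day.items():
    grouped[(race, WEEKDAYS[day.weekday()])].append(n)

# Step 3: summarise each group by mean and median.
summary = {key: (mean(counts), median(counts)) for key, counts in grouped.items()}
print(summary)
```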
Overall, APOWW (apprehension by a peace officer without a warrant) is the most common incident type, followed by No Arrest and Public Intoxication. Warrant and Assault are the other common types. White subjects account for a larger share of Public Intoxication incidents than the other races, Black subjects for a larger share of Warrant and APOWW, and Hispanic subjects for a larger share of No Arrest. Here, ‘larger share’ means relative to each group’s share of the whole dataset.
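One way to make "relative to their share of the data" concrete is a ratio of two shares, sketched below in Python. The overall shares are rounded from the race percentages reported earlier; the per-offense counts are hypothetical numbers chosen for illustration and are not taken from the data.

```python
# Overall share of each main race in the data (rounded from the earlier output).
overall_share = {"Black": 0.5594, "Hispanic": 0.2199, "White": 0.1972}

# Hypothetical counts for one offense type (NOT from the data).
offense_counts = {"Black": 40, "Hispanic": 20, "White": 40}

offense_total = sum(offense_counts.values())

# Ratio > 1: the group is over-represented in this offense relative to its
# overall share; ratio < 1: under-represented.
representation_ratio = {
    race: (count / offense_total) / overall_share[race]
    for race, count in offense_counts.items()
}

for race, ratio in representation_ratio.items():
    print(f"{race:<9}{ratio:.2f}")
```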
Let’s take a deeper look at incidents involving Black subjects by gender. APOWW and No Arrest are the most common incident types among Black subjects, followed by Warrant, Public Intoxication and Assault. Female subjects make up a larger share of APOWW than of the other incident types.
Let’s take a deeper look at incidents involving Hispanic subjects by gender. No Arrest, APOWW and Public Intoxication are the most common incident types among Hispanic subjects, followed by Warrant, Public Servant and Assault. Female subjects make up a larger share of APOWW than of the other incident types.
Let’s take a deeper look at incidents involving White subjects by gender. APOWW, Public Intoxication and No Arrest are the most common incident types among White subjects, followed by Warrant, Public Servant and Burglary. Female subjects make up a larger share of APOWW and Public Intoxication than of the other incident types.
We now explore incidents by subject gender and race: how many incidents involve each gender, and which types of incident are more common for which gender.
All types of incident involve more ‘Male’ than ‘Female’ subjects. ‘APOWW’, ‘No Arrest’ and ‘Public Intoxication’ are the most common types. For ‘Male’ subjects the most common type is ‘No Arrest’, with ‘APOWW’ a close second. For ‘Female’ subjects the most common type is ‘APOWW’ by quite a margin: it is several times more frequent than the second most common, ‘No Arrest’.
Incidents are spread evenly across the map regardless of gender. There is no specific area where ‘Male’ or ‘Female’ subjects are more concentrated. For both genders, more incidents occur in the centre.
Among male subjects, the top five incident types are ‘No Arrest’, ‘APOWW’, ‘Public Intoxication’, ‘Warrant/Hold’ and ‘Assault/FV’, in that order, and ‘Black’ subjects account for the most incidents in each. In almost every incident type, ‘Black’ males have the highest count. ‘No Arrest’, ‘Public Intoxication’ and ‘Warrant/Hold’ involve more ‘Hispanic’ than ‘White’ subjects, whereas ‘APOWW’ and ‘Assault/FV’ involve more ‘White’ subjects.
Among female subjects, the top five incident types are ‘APOWW’, ‘No Arrest’, ‘Public Intoxication’, ‘Assault/FV’ and ‘Warrant/Hold’, in that order. ‘Black’ subjects account for the most incidents in each except ‘Public Intoxication’, where ‘White’ subjects have several times as many. In most incident types, ‘Black’ females have the highest count. ‘Public Intoxication’ and ‘APOWW’ involve more ‘White’ than ‘Hispanic’ subjects, whereas ‘No Arrest’, ‘Warrant/Hold’ and ‘Assault/FV’ involve more ‘Hispanic’ subjects.
Let’s reformat the force data and find out how many types of force were used per incident. We also check the number of force types by gender and race.
##
## 1 2 3 4 5 6 7 8 10
## 747 763 486 230 96 39 17 4 1
Using 2 types of force is the most common case (763 incidents), followed closely by 1 type (747). Beyond that, the incident count falls steadily as the number of force types increases. Incidents with 8 or 10 types are very rare, and none has exactly 9, which makes sense.
For ‘Black’ subjects, 2 types of force are more common than 1. For ‘White’ and ‘Hispanic’ subjects, 1 type is more common. Otherwise, each race follows the overall pattern of the incident count falling as the number of force types rises.
For ‘Female’ subjects, 2 and 3 types of force are more common than 1. For ‘Male’ subjects, 1, 2 and 3 types are the most common, in that order. Otherwise, each gender follows the overall pattern of the incident count falling as the number of force types rises.
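Counting force types per incident can be sketched as counting the non-empty `TYPE_OF_FORCE_USED` columns in each row. In this Python sketch the toy rows and the choice of missing-value markers are assumptions; the `reported` counts are copied from the table above and used only as a consistency check against the 2383 rows.

```python
from collections import Counter

FORCE_COLUMNS = [f"TYPE_OF_FORCE_USED{i}" for i in range(1, 11)]
MISSING_MARKERS = {"", "NULL"}  # assumption: both mean "no force recorded"

def count_force_types(row):
    """Number of force columns holding a real value for one incident."""
    return sum(
        1 for col in FORCE_COLUMNS
        if (row.get(col) or "").strip() not in MISSING_MARKERS
    )

# Toy rows (hypothetical); columns not listed are treated as missing.
toy_rows = [
    {"TYPE_OF_FORCE_USED1": "Verbal Command", "TYPE_OF_FORCE_USED2": ""},
    {"TYPE_OF_FORCE_USED1": "Verbal Command", "TYPE_OF_FORCE_USED2": "Held Suspect Down"},
    {"TYPE_OF_FORCE_USED1": "Taser", "TYPE_OF_FORCE_USED2": "NULL"},
]
toy_distribution = Counter(count_force_types(r) for r in toy_rows)

# Counts copied from the table above: number of force types -> incidents.
reported = {1: 747, 2: 763, 3: 486, 4: 230, 5: 96, 6: 39, 7: 17, 8: 4, 10: 1}
total = sum(reported.values())
share_one_or_two = (reported[1] + reported[2]) / total
print(dict(toy_distribution), total, round(share_one_or_two, 3))
```

The reported counts add up to 2383, so every incident has at least one force type recorded, and roughly 63% of incidents involve only one or two types.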
Database: https://www.kaggle.com/datasets/center-for-policing-equity/data-science-for-good
Source Code: https://github.com/rifat1234/Dallas-Texas-Policing-Data-Analysis
https://www.kaggle.com/code/shivamb/4-3-analysis-report-officer-level-analysis
https://www.kaggle.com/code/yashedpotatoes/tidying-acs-data-in-r-and-python
https://www.kaggle.com/code/araraonline/austin-use-of-force-eda
https://www.kaggle.com/code/vincentkr18/eda-time-series-analysis-policing-equity
https://epirhandbook.com/en/ggplot-basics.html
http://lab.rady.ucsd.edu/sawtooth/business_analytics_in_r/Viz1.html